Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue
Abstract
The use of irony and sarcasm in social media allows us to study them at scale for the first time. However, their diversity has made it difficult to construct a high-quality corpus of sarcasm in dialogue. Here, we describe the process of creating a large-scale, highly diverse corpus of online debate forums dialogue, and our novel methods for operationalizing classes of sarcasm in the form of rhetorical questions and hyperbole. We show that we can use lexico-syntactic cues to reliably retrieve sarcastic utterances with high accuracy. To demonstrate the properties and quality of our corpus, we conduct supervised learning experiments with simple features, and show that we achieve both higher precision and F-measure than previous work on sarcasm in debate forums dialogue. We apply a weakly-supervised linguistic pattern learner and qualitatively analyze the linguistic differences in each class.
Related resources
"yeah Right": Sarcasm Recognition for Spoken Dialogue Systems
The robust understanding of sarcasm in a spoken dialogue system requires a reformulation of the dialogue manager’s basic assumptions behind, for example, user behavior and grounding strategies. But automatically detecting a sarcastic tone of voice is not a simple matter. This paper presents some experiments toward sarcasm recognition using prosodic, spectral, and contextual cues. Our results de...
Harnessing Sequence Labeling for Sarcasm Detection in Dialogue from TV Series 'Friends'
This paper is a novel study that views sarcasm detection in dialogue as a sequence labeling task, where a dialogue is made up of a sequence of utterances. We create a manually labeled dataset of dialogue from the TV series ‘Friends’ annotated with sarcasm. Our goal is to predict sarcasm in each utterance, using the sequential nature of a scene. We show performance gain using sequence labeling as compare...
A Large Self-Annotated Corpus for Sarcasm
We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements — 10 times more than any previous dataset — and many times more instances of non-sarcastic statements, allowing for learning in both balanced and unbalanced label regimes. Each statement is furthe...
Irony and Sarcasm: Corpus Generation and Analysis Using Crowdsourcing
The ability to reliably identify sarcasm and irony in text can improve the performance of many Natural Language Processing (NLP) systems including summarization, sentiment analysis, etc. The existing sarcasm detection systems have focused on identifying sarcasm on a sentence level or for a specific phrase. However, often it is impossible to identify a sentence containing sarcasm without knowing...
An Improved Method for Detection of Satire from User-Generated Content
Sarcasm is a form of speech act in which the speakers convey their message in an implicit way. It is a sophisticated form of speech act widely used in online communities. The inherently ambiguous nature of sarcasm sometimes makes it hard even for humans to decide whether an utterance is sarcastic in nature or not. Recognition of sarcasm may anticipate benefits in many sentiment analysis of NLP ...